ABSTRACT Engaging large numbers of undergraduates in authentic scientific discovery is desirable but difficult to achieve. We
have developed a general model in which faculty and teaching assistants from diverse academic institutions are trained to teach a 
research course for first-year undergraduate students focused on bacteriophage discovery and genomics. The course is situated 
within a broader scientific context aimed at understanding viral diversity, such that faculty and students are collaborators with 
established researchers in the field. The Howard Hughes Medical Institute (HHMI) Science Education Alliance Phage Hunters 
Advancing Genomics and Evolutionary Science (SEA-PHAGES) course has been widely implemented and has been taken by over 
4,800 students at 73 institutions. We show here that this alliance-sourced model not only substantially advances the field of 
phage genomics but also stimulates students' interest in science, positively influences academic achievement, and enhances per- 
sistence in science, technology, engineering, and mathematics (STEM) disciplines. Broad application of this model by integrating 
other research areas with large numbers of early-career undergraduate students has the potential to be transformative in science 
education and research training. 

IMPORTANCE Engagement of undergraduate students in scientific research at early stages in their careers presents an opportunity 
to excite students about science, technology, engineering, and mathematics (STEM) disciplines and promote continued interests 
in these areas. Many excellent course-based undergraduate research experiences have been developed, but scaling these to a 
broader impact with larger numbers of students is challenging. The Howard Hughes Medical Institute (HHMI) Science Education
Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) program takes advantage of the
huge size and diversity of the bacteriophage population to engage students in discovery of new viruses, genome annotation, and 
comparative genomics, with strong impacts on bacteriophage research, increased persistence in STEM fields, and student self- 
identification with learning gains, motivation, attitude, and career aspirations. 

In 2012, the President's Council of Advisors on Science and
Technology (PCAST) reported that there is a need for an additional
one million science, technology, engineering, and mathematics
(STEM) graduates in the United States over the next decade
to meet U.S. economic demands (1). It was noted that even a
modest increase in the persistence of STEM students in the first 
2 years of their undergraduate education would alleviate much of 
this shortfall (1). Replacing conventional introductory laboratory 
courses with discovery-based research courses is a key recommen- 
dation that is expected to lead to enhanced retention. Providing 
authentic research experiences to undergraduate students and di- 
recting them toward careers in STEM is a priority of science edu- 
cation in the 21st century (1-4). 

An abundance of evidence shows that involvement of under- 
graduate students in authentic research experiences has strong 
benefits for their engagement and interest in science (5-7) and 
that this often increases student interest in STEM careers (8). It is 
common for undergraduate students at research colleges and uni- 
versities to participate in faculty-led research programs — espe- 
cially during their last 2 years — with graduate students and post- 
doctoral researchers participating in their mentorship (9). 
Research experiences promote college retention (10), but the ca- 
pacity for high-quality mentored undergraduate research within 
faculty research programs is limited, and this route is unlikely 
alone to satisfy the economic demands of the coming decade. 
There have been many successful efforts to develop classroom 
undergraduate research experiences (11-14; see also
http://www.sciencemag.org/site/special/ibi/ and http://www.curenet.org/),
but identifying authentic research experiences that scale to larger 
numbers of undergraduate students often proves elusive (4). 
Bioinformatic approaches engaging substantial numbers of stu- 
dents at diverse institutions have been described (15, 16) and are 
successful in providing research experiences (14) but do not in- 
clude a wet-bench laboratory component. 

Taking advantage of research infrastructures at research- 
intensive institutions to advance missions in undergraduate edu- 
cation is desirable, and community-oriented approaches have 
been developed (17, 18), although the potential is largely un- 
tapped. Some research projects are likely to be more suitable for 
undergraduate involvement than others, and identifying those 
both rich in discovery and accessible to early-career students is 
challenging (19). The Phage Hunters Integrating Research and 
Education (PHIRE) program, in which undergraduate and high 
school students isolate novel bacteriophages, sequence their ge- 
nomes, annotate them, and analyze them from a comparative 
genomics perspective, is one response to this challenge (19-21). 
The approach takes advantage of the large, dynamic, old, and 
highly genetically diverse nature of the bacteriophage population 
(22, 23). Moreover, although phages play key roles in bacterial 
pathogenesis (24) and the global climate and ecology (25), we 
know remarkably little about them outside a few well-studied pro- 
totypes. 

Phages can be easily isolated from the environment, and their 
relatively small genomes (40 to 150 kbp) are readily sequenced 
and annotated (26). Phage isolation requires little prior expert 
knowledge or technical skill, providing an accessible entry point 
for students from all backgrounds to engage in inquiry-based sci- 
ence (21). Each isolated phage is new, students can name their 
own phage, and a sense of ownership in their discovery helps to 
motivate them to explore the secrets of their phage by isolating
genomic DNA, determining its sequence, annotating gene predictions,
and comparing the sequence to that of other known viruses
(21). This programmatic transition from a broadly accessible and 
concrete introduction to sophisticated genomic analysis provides 
a rich and structured education platform (27), applicable to STEM 
and non-STEM students, including first-year undergraduates 
(28-30). 

To test whether the PHIRE approach can be extended to
environments beyond the expert phage-focused research laboratory,
the Howard Hughes Medical Institute (HHMI), the University of
Pittsburgh, and James Madison University investigated a
framework enabling broad usage at diverse institutions, involving
large numbers of undergraduate students and nonexpert instructors,
and assessed its impact. The approach proved to be scalable
(4,800 students at 73 schools over 5 years) and implementable
at both research-intensive and research-poor institutions. It
generated gains in phage biology research and enhanced student
retention, and the student-reported gains were equivalent to those
from an intense summer research experience.

RESULTS 

The attributes of the PHIRE program at the University of Pitts- 
burgh demonstrate that phage discovery and genomics are a plat- 
form that supports engagement of students in authentic research 
without requiring prior mastery of anything other than very basic 
concepts and content material (21). We therefore examined 
whether this could be broadly implemented at institutions with a 
wide spectrum of missions and demographics, without a require- 
ment for resident expertise in bacteriophage biology. Our core 
hypothesis was that student participation in this research would 
generate new insights into phage diversity and evolution while 
simultaneously elevating student engagement in science, stimu- 
lating overall academic performance, and encouraging persistence 
in STEM fields. Below, we report the structure of the HHMI Sci- 
ence Education Alliance Phage Hunters Advancing Genomics and 
Evolutionary Science (SEA-PHAGES) course and its impacts on 
both research advances and student learning. 

The SEA-PHAGES course. The SEA-PHAGES course (for- 
merly called the National Genomics Research Initiative) is a year- 
long research experience targeted at beginning college students. 
Classes typically enroll 18 to 24 students and are taught by one or 
two faculty members together with a student teaching assistant. In 
the first term, students isolate phages from locally collected soil 
samples using Mycobacterium smegmatis as the primary bacterial 
host, a nonpathogenic strain relevant to understanding Mycobac- 
terium tuberculosis. Students purify and characterize their phages, 
visualize them with electron microscopy, and extract and purify 
the DNA. The genome of one phage isolate is sequenced between 
terms, and in the second term, students annotate the genome us- 
ing bioinformatics tools to define putative genes, understand 
genomic arrangements, and predict protein functions. Sequence 
and annotation quality is expertly reviewed and collated on the 
PhagesDB database (http://www.phagesdb.org) and submitted to 
GenBank. The Phamerator program (31) is used to explore ge- 
nome relationships, and all phage samples are archived for use by 
the research community. 

The SEA-PHAGES course curriculum aims to introduce students
to research methods and approaches, experimental design,
and data interpretation but does not seek to instruct students in
content matter outside the immediate biological context. But, as
students are direct participants in scientific discovery, the goal is
to engage, excite, increase the confidence of, and draw students
into a cycle of self-motivation. If successful, we predicted that this
would translate into enhanced performance in other STEM
classes, greater retention within STEM training, and an increase in
the numbers of students seeking continued research experiences
beyond their freshman year.

TABLE 1 Diversity of institutions participating in SEA-PHAGES

Carnegie classification^a                                   No. of schools
Research universities; very high/high research activity     30
Master's degree-granting colleges and universities          18
Baccalaureate colleges                                       22
Associate's degree-granting colleges                         3

^a Schools offering the SEA-PHAGES course are organized according to their
classification by the Carnegie Foundation for the Advancement of Teaching (2010).

Program faculty and teaching assistants are trained at two 
weeklong workshops, one for each term of the course. Detailed 
manuals are provided, and community discussions are facilitated 
by a wiki site. Students and faculty present their findings at an 
annual SEA-PHAGES Research Symposium, at regional and na- 
tional meetings, and through peer-reviewed publications. In the 
5 years of the program, more than 4,800 students have partici- 
pated (1,800 in 2012-2013), including STEM majors, non-STEM 
majors, honors students, and "typical" students. The number of 
participating schools has grown to more than 70 institutions (see 
Table S1 in the supplemental material), ranging from community
colleges to research universities (Table 1). As can be seen from
these program design features, the educational model of the SEA- 
PHAGES program integrates course-based learning within a 
framework of scientific activity, including a real-world scientific
research agenda, professional networking, and scientific dissemi- 
nation of results. In this way, the cost-effectiveness of course- 
based learning is combined with professional science with mutual 
benefits. 

Gains in understanding viral diversity. The contributions of 
the SEA-PHAGES students have been essential to our current un- 
derstanding of the diversity of mycobacteriophages, demonstrat- 
ing the substantial impact of the distributed approach compared 
to what would be accomplished by a single laboratory, and have 
resulted in several publications with student authors (29, 31-39). 
Since the start of the program in 2008, SEA-PHAGES students 
have isolated 3,000 new phages (with global positioning system 
[GPS] coordinates recorded) and characterized their phages by 
DNA restriction analysis and electron microscopy. More than 450 
mycobacteriophage genomes have been sequenced and anno- 
tated, and more than 350 sequences have been deposited in Gen- 
Bank (Fig. 1). These genomes include many distinctly different 
types and numerous complex variants (40), and the entire genome 
collection codes for over 48,000 genes representing 3,780 sequence
phamilies (each a group of proteins in which every member shares
similarity with at least one other member above threshold BlastP
and Clustal values [31]). Correlations between genome and
geography or time of isolation have
been explored (35, 41), as well as the evolutionary mechanisms
contributing to the pervasive genome mosaicism (33). The genomes
contain numerous examples of biological intrigue, including
novel inteins, introns, mobile elements, immunity systems,
and regulatory schemes (33-35, 42-45), as well as potential for
developing new tools for understanding tuberculosis (46-49).

FIG 1 SEA-PHAGES students contribute to scientific knowledge. Results are from the first 5 years of the SEA-PHAGES program isolating new phages, showing
the cumulative numbers of phages isolated (blue), cumulative numbers of genomes sequenced (orange), cumulative numbers of gene phamilies (purple), and
total numbers of mycobacteriophages in GenBank (green). Not all genomes sequenced and annotated in year 5 are yet available in GenBank.

FIG 2 Student evaluation of learning gains. Mean learning gains for common survey items on the SURE (green diamonds), CURE (blue squares), and the
SEA-PHAGES (red triangles) assessment instruments are shown. The SURE survey data represent 2,358 students who completed summer research in 2009; the
CURE survey data represent 476 students who were enrolled in science courses that were described by their instructors as without a research element (data
collected for fall 2007 through spring 2009); the SEA-PHAGES data represent 121 students who evaluated their course following the academic year 2008-2009.
Error bars represent 2 standard errors around the mean.
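
The phamily definition given in this section (proteins grouped so that each member is similar to at least one other member) corresponds to grouping by connected components. The following is a minimal sketch of that idea only, not the Phamerator implementation: the function name `build_phamilies`, the protein labels, and the list of similar pairs are hypothetical, and the pairs are assumed to have already passed whatever BlastP and Clustal thresholds are in use.

```python
from collections import defaultdict


def build_phamilies(proteins, similar_pairs):
    """Group proteins so each member is linked to at least one other member.

    `similar_pairs` is assumed to contain only pairs that already exceed
    the chosen similarity thresholds (hypothetical input format).
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the group representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    # Return groups as sorted lists, ordered by their first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up example: gpA~gpB and gpB~gpC, so gpA and gpC share a group
# even though they are not directly similar to each other.
proteins = ["gpA", "gpB", "gpC", "gpD", "gpE"]
similar_pairs = [("gpA", "gpB"), ("gpB", "gpC")]
phamilies = build_phamilies(proteins, similar_pairs)
```

In this sketch, a protein with no similar partner simply remains in a group of its own.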

The diversity of phages known to infect a single common host 
is remarkable; there are many thousands of potential bacterial 
hosts for phage isolation, and host range studies suggest that sim- 
ply using a different strain of the same bacterial species will result 
in distinct profiles of diversity (38). With an estimated 10^31 phage
particles in the biosphere and a population that turns over every 
few days (23), there is an inexhaustible reservoir for discovery. 

Impacts on student education and retention. The Survey of 
Undergraduate Research Experience (SURE) and the Classroom 
Undergraduate Research Experience (CURE) measure the stu- 
dents' assessment of their understanding of science and scientists, 
confidence in their ability to perform research, and their perceived 
gains in skills (50). The self-perceptions of learning gains, moti- 
vation and attitude, and career aspirations of the SEA-PHAGES 
course participants were assessed with pre- and postcourse SURE-like
surveys (see Fig. S1 in the supplemental material). Twenty of
the SEA-PHAGES survey items are shared with the regular SURE 
and CURE surveys, allowing the comparison of the SEA-PHAGES 
students' learning gains with those of students who engaged in a 
dedicated summer research experience (SURE) and students who 
completed traditional science courses with no research element 
(CURE) (Fig. 2). The SEA-PHAGES students scored as well as or 
better on all 20 learning gains compared to the SURE students, 
reflecting benefits at least equivalent to those accrued through a 
summer-long apprentice-based undergraduate research experience.
The increase in scientific self-efficacy reported by the SEA-PHAGES
students is likely to be directly related to their retention
in science (51). 

To analyze the effect of the SEA-PHAGES course on student 
persistence, we compared retention of students enrolled in the 
SEA-PHAGES course (77% first-year students and 95% STEM 
majors) with two benchmark statistics: the retention of all stu- 
dents and the retention of STEM majors with the same number of 
years of college experience and enrolled at the same school 
(Fig. 3A), important parameters given the typical rates for student 
attrition between first- and second-year STEM undergraduates 
(52). Data were from 27 comparisons from 20 institutions and 
show clearly that SEA-PHAGES students matriculated into the 
second year at significantly higher rates than did either benchmark 
group. Thus, early engagement in a research experience improves 
student retention into the second year. The positive impacts of 
this course-based research experience are similar to what has been 
reported for apprentice-based research experiences (5, 53), repre- 
sent an effective response to the call to action in the National 
Science Foundation (NSF) Vision and Change and PCAST reports 
(1, 4), and provide validation for this educational model on a 
larger scale. 

Anticipating that research-stimulated motivation will influ- 
ence student performance in other courses, we selected six schools 
that substituted the SEA-PHAGES course for a regular biology 
laboratory and compared the grades of participating students in 
the accompanying biology lecture course (Fig. 3B). We limited 
this analysis to schools that enrolled "typical" students into the 
PHAGES lab sections rather than those aimed at honors students 
or students at academic risk. The biology lecture course grades of 
SEA-PHAGES students were compared directly to those of peers 
enrolled in the same lecture course but in the regular biology 
laboratory. As is the case with most applied research, students
were not randomly assigned to conditions, and even among these
"typical" students, there may have been some self-selection for
registration in the SEA-PHAGES course. We observed substantial
differences in both the average grades and the grade distribution
of SEA-PHAGES students relative to those of students in traditional
lab sections (Fig. 3B), and although these data are preliminary
and warrant further study, they suggest that there could be
broad educational benefits to the SEA-PHAGES experience.
Because of the concern that SEA-PHAGES students might suffer
from lack of exposure to a broader coverage of subject matter in
the regular laboratory course, we developed a 25-item pre- and
postcourse survey of biological concepts (see Fig. S2 in the
supplemental material) which was administered to students before and
after the laboratory courses. There was no significant difference in
performances on the test between SEA-PHAGES students and the
comparison group of students (see Fig. S3). Both groups improved
from pretest to posttest, and there was no significant difference
between the groups in terms of the extent of their improvement.
The lack of exposure to additional topics in the SEA-PHAGES
course thus had no obvious detrimental effect.

FIG 3 (A) Retention of SEA-PHAGES participants (red) compared to other
students at the same institution (blue), year 1 to year 2 of their college
experience. Retention data were gathered from 20 institutions, with some
institutions contributing data from multiple years, resulting in 27 sets of
comparison data. Retention data were analyzed with a between-group analysis
of variance with 3 levels of the independent variable (all majors, STEM
majors, and SEA-PHAGES students) for 171 reports. The result was interpreted
as significant at the 0.05 level. (B) SEA-PHAGES students (red) perform
better than peers (blue) in traditional laboratory sections in the
introductory lecture course. Results are for 127 SEA-PHAGES students and
1,120 students in the traditional laboratory course from six institutions.
In the lecture course, SEA-PHAGES students averaged 2.95 on a 4.0 scale,
compared to the 2.58 average of students in traditional lab sections. This
difference was significant (t = 2.64; P < 0.05).

DISCUSSION 

The HHMI SEA-PHAGES program provides a general model for 
accomplishing improvements in the persistence of students in sci- 
ence by transforming a small-scale scientific inquiry into a cross- 
institution education platform that engages first-year students. 
The outcomes are consistent and robust, benefitting diverse 
groups of students across a variety of institutions. The materials 
costs are similar to those of other inquiry-based courses, and 
many institutions have implemented the course without external
support, other than assistance with sequencing costs and programmatic
and scientific support from HHMI and the University
of Pittsburgh (some schools received direct external support for 
materials during their first 3 years in the program). The size and 
diversity of the phage population provide an inexhaustible wealth 
of biological novelty that imposes no obvious limit on the number 
of students who can participate. Future opportunities include fur- 
ther broadening the implementation of the SEA-PHAGES course 
as well as extending the model to development of similar projects 
in which scientific discovery, project ownership, and simple entry 
points can be implemented at the first-year college level. Meeting 
these opportunities will lead to a broad and sustainable enhance- 
ment of undergraduate science education, an advancement of sci- 
entific knowledge, and an increase of student persistence in sci- 
ence. 

MATERIALS AND METHODS 

Participants. The study was conducted with SEA-PHAGES faculty and 
students in the United States and the Commonwealth of Puerto Rico. 
David Lopatto and participant institutions obtained appropriate institu- 
tional review board (IRB) approval. SEA-PHAGES faculty are trained in a 
weeklong workshop focusing on in situ procedures and pedagogy in prep- 
aration for the fall semester and a weeklong workshop focusing on in silico 
bioinformatics tools in preparation for the spring semester. Faculty and 
students are invited to a SEA-PHAGES National Symposium to present 
their scientific findings. The SEA office conducts annual site visits and 
provides continuous technical support for institutions year-round. The 
SEA Wiki maintains an up-to-date depository for announcements,
communication forums for faculty and students, curriculum resources,
instructional materials, and research archives. SEA-PHAGES faculty
members recruited comparison group students on a volunteer basis to enhance
the validity of statistical analysis. The comparison group students were 
recruited among students taking introductory laboratory courses. Except 
for the student grade analysis, comparison group students cannot be 
matched to SEA-PHAGES students on each campus, so statistical analysis 
was limited to quasiexperimental analysis based on a nonequivalent com- 
parison group. Systemic Research sent out invitations to all consenting 
students' e-mail addresses individually. 

Analysis. During academic year 2009-2010, different aspects of the
SEA-PHAGES and comparison groups were measured. White/Caucasian
students made up the majority of each group, 66% of SEA-PHAGES stu- 
dents and 76% of comparison group students. The majority of both 
groups lived in suburban communities (66% SEA-PHAGES and 64% 
comparison group students), attended public high schools (83% SEA- 
PHAGES and 83% comparison group students), and were in their first 
year in college (SEA-PHAGES, 77% first-year students, 18% sophomores; 
comparison group, 70% first-year students, 20% sophomores). There
was a higher percentage of male students in the SEA-PHAGES course
(38%) than in the comparison group (29%), but in both groups, female 
students were the clear majority. 

Retention rates. The Institutional Annual Survey measures student 
retention rates by tracking full-time, first-time entering students who are 
seeking bachelor's degrees. The Institutional Annual Survey was con- 
ducted among institutions during November to December. Retention 
rates were calculated for students returning in fall 2008 and fall 2009. An 
analysis of variance was performed over 3 groups (all majors, STEM ma- 
jors, and SEA-PHAGES students). The data were reported by institution 
and category, including 63 reports for all majors, 43 reports for STEM 
majors, and 65 reports for SEA-PHAGES students. 
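
The between-group analysis of variance described above can be illustrated with a minimal sketch of the F statistic for three groups. The function name `one_way_anova_f` and the retention values are hypothetical, invented only for illustration; they are not the study's data, and the source does not state which software was used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-group ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (levels of the independent variable)
    n = len(all_values)   # total number of observations (reports)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up retention rates (fraction returning for year 2), three reports per group.
all_majors = [0.78, 0.80, 0.82]
stem_majors = [0.80, 0.82, 0.84]
sea_phages = [0.90, 0.92, 0.94]

f_stat, df_between, df_within = one_way_anova_f(
    [all_majors, stem_majors, sea_phages]
)
```

With the 63, 43, and 65 reports described above (171 in total), the same calculation would have 2 and 168 degrees of freedom.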

The SEA CURE survey. The Classroom Undergraduate Research 
Experience (CURE) survey was specially adapted to the SEA-PHAGES 
program by David Lopatto (Grinnell College, Grinnell, IA). The CURE 
survey consists of multiple sections, including institution, class,
demographics, science-related activities, major and minor concentration,
postgraduate academic goals, experiences in laboratory course elements,
experience in research, engagement in activities or endeavors, course
benefit, learning experience in laboratory experiments and tools, overall 
course evaluation, and opinions about science. Systemic Research added a 
few questions to the postcourse CURE survey to collect data regarding 
students' SEA-PHAGES course satisfaction, SEA Wiki access and utiliza- 
tion, SEA-PHAGES research paper and presentation experience, and gen- 
eral comments. The survey was administered twice a year: the presurvey at 
the beginning of the fall semester and the postsurvey at the end of the 
spring semester. As with the Biological Concepts Survey (BCS), Systemic 
Research developed the online survey forms using the Vovici EFM Com- 
munity Professional website. The pre- and postcourse survey invitations 
were e-mailed to individual students according to their academic calen- 
dars. Using Vovici's survey follow-up feature, three reminder e-mails 
were sent after the initial invitations. The collected survey responses were 
securely saved in a dedicated Vovici HHMI website and Systemic Re- 
search's NGRI student database. The SURE survey data represent 2,358 
students who completed summer research in 2009; the CURE survey data 
represent 476 students who evaluated science courses that were described 
by their instructors as without a research element (data collected fall 2007 
through spring 2009); the SEA-PHAGES data represent 121 students who 
evaluated their course following the academic year 2008-2009. Mean 
learning gains were calculated for each category of the 20 items common 
to both the CURE and SURE surveys. 
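
As an illustration of how a mean learning gain and its error bar (2 standard errors, as in Fig. 2) might be computed for one survey item, a minimal sketch follows. The function name `mean_with_error_bar`, the rating scale, and the ratings themselves are assumptions made for this example, and the error bar is read here as the mean plus or minus two standard errors.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_error_bar(ratings, n_se=2):
    """Return (mean, lower, upper), where the bar spans n_se standard errors."""
    m = mean(ratings)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = stdev(ratings) / sqrt(len(ratings))
    return m, m - n_se * se, m + n_se * se


# Made-up self-reported gains for a single item (hypothetical numeric scale).
item_ratings = [3, 4, 4, 5]
item_mean, lower, upper = mean_with_error_bar(item_ratings)
```

Repeating this for each of the 20 shared items and each survey group would give one point and one error bar per item per group.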

Grades. Eleven institutions submitted their SEA-PHAGES students' 
laboratory and introductory biology course performance data for fall 2008 
and spring 2009 in the academic year 2008-2009 and fall 2009 and spring 
2010 in the academic year 2009-2010. Letter grade distributions for both 
SEA-PHAGES and comparison students were collected. Six institutions 
had matched data that were utilized in the analysis, with 127 SEA- 
PHAGES and 1,120 comparison student grades. For statistical analysis, 
the letter grades were assigned numerical values from 4 (grade A) to 0 
(grade F). t tests were performed comparing the mean grades received by
SEA-PHAGES students and comparison group students in the biology 
lecture course. 
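
The grade conversion and comparison of means described above can be sketched as follows. This is an illustration under stated assumptions: the source does not say whether plus/minus grades were recorded or which form of the t test was used, so the sketch uses plain letter grades and Welch's unequal-variance t statistic. The names `GRADE_POINTS`, `to_points`, and `welch_t` and the grade lists are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Letter grades mapped to numerical values from 4 (A) to 0 (F).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def to_points(letters):
    """Convert a list of letter grades to grade points."""
    return [GRADE_POINTS[g] for g in letters]


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (assumed test form)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    std_err = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / std_err


# Made-up grades, not the study's data.
sea_points = to_points(["A", "A", "B", "B", "C"])
comparison_points = to_points(["A", "B", "C", "C", "D", "F"])
t_stat = welch_t(sea_points, comparison_points)
```

A positive `t_stat` here means the first group has the higher mean grade; judging significance would additionally require the degrees of freedom and a t distribution, which this sketch omits.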

Biological methods. Mycobacteriophage isolation was performed using
Mycobacterium smegmatis mc²155 as a host, and phages were identified
as PFU either by direct plating on bacterial lawns or after enrichment
in the presence of M. smegmatis. Following purification and amplification,
DNA was isolated and sequenced using Sanger, 454, or Ion Torrent tech- 
nologies, using a shotgun approach followed by targeted sequencing to 
validate ambiguities and determine genome ends. Genome annotations 
were performed using various software platforms, including GBrowse 
(54), Apollo (55), DNAMaster (http://cobamide2.bio.pitt.edu/), Glim- 
mer (56), GeneMark (57), and analysis programs available at the National 
Center for Biotechnology Information (NCBI). Comparative genomics 
used Phamerator (31) and Gepard (58). Assembled genome sequences 
and genome annotations were subjected to expert review prior to submis- 
sion to GenBank. Detailed methods for phage isolation, sequencing, and 
analysis are available on PhagesDB (http://phagesdb.org). 

SUPPLEMENTAL MATERIAL 

Supplemental material for this article may be found at http://mbio.asm.org 